PostgreSQL Waffles!

FOSDEM
2021-02-06

Image by Pezibear from Pixabay

Who am I

  • Lætitia Avrot
  • mydbanotebook.org
  • @l_avrot
  • #PostgresWomen co-founder
  • PostgreSQL Europe Treasurer
  • EDB Senior Database Consultant
Image by Anemone123 from Pixabay

Waffles

  • Brussels, FOSDEM and food
  • What is an extension
  • How to create one
  • Practical example with waffles recipes
Image by Herbich from Pixabay

What's an extension ?

    A package to hold several database objects. Object can be:
  • schemas, tables, views
  • functions, procedures
  • custom datatypes, custom domains
  • ...
Image by Tim Hill from Pixabay

Why use them ?

A useful extension to PostgreSQL typically includes multiple SQL objects; for example, a new data type will require new functions, new operators, and probably new index operator classes. It is helpful to collect all these objects into a single package to simplify database management. PostgreSQL documentation
Image by Gerd Altmann from Pixabay

PostgreSQL Extensions: A History

Image by congerdesign from Pixabay
Our second goal is to allow new data types, new operators and new access methods to be included in the DBMS. Moreover, it is crucial that they be implementable by non-experts which means easy-to-use interfaces should be preserved for any code that will be written by a user.

M. Stonebraeker and L. Rowe, The design of Postgres, May 1986
Image by congerdesign from Pixabay
  • 2005, Postgres 8.0: Makefile framework (Fabien Coelho, Peter)
  • 2010: PGXN creation (David Wheeler)
  • 2011, Postgres 9.1: create/alter/drop extension (Dimitri Fontaine, Tom Lane)
  • 2016, Postgres 9.6: Add cascade option to create extension (Petr Jelínek)
Image by congerdesign from Pixabay

Extension examples

  • postgis
  • mobilityDB
  • pg_stat_statements
  • auto_explain
  • pgcrypto
  • pg_prewarm

Extension examples

  • citus
  • unaccent
  • zombodb
  • FDW (foreign data wrappers)
  • ...

We want... waffles!

    We will create an extension that:
  1. Searches "Belgian waffles" in allrecipes.com
  2. Retrieves the first recipe with ingredients and steps
  3. Stores everything in a relational data model
  4. Displays the steps for the "best" (highest ranking) recipe
Image by congerdesign from Pixabay

Postgres extension recipe

  1. Find a great name
  2. Create your source code
  3. Create your makefile
  4. Test it
  5. Register your extension under PGXN
Image by congerdesign from Pixabay

1. Find a great name!

pgwaffles
Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($).PostgreSQL Documentation
  • No hyphen
  • Underscores accepted

2. Source code!

  1. Get the results of the allrecipes search
  2. Parse the results to find the links
  3. Open the first link and parse the recipes
  4. Store everything
  5. Display the recipe
create temporary table allRecipes (html text);

copy allRecipes from program 'curl -s
  "{url}"
  | hxnormalize -x | hxselect -i
    "{tag selector}" | tr "\n" " "'; 
  • Use of curl to get the source code of the page
  • Use of html-xml-utils tools from w3c to clean the said source code and isolate the links
  • Use of tr to remove newlines
laetitia=# select * from allrecipes ;

                                  html
----------------------------------------------
 
{...}
   
  (1 row) 
 
 with rawHtml(parts) as (
  /* Let's get the url links at the begining */
  select regexp_split_to_table(html, 'href="')
  from allRecipes
)
  /* Let's stop at the end of the link's url */
  select split_part(parts, '/">', 1)
  from rawHtml
  /* The first part does not contain a real link */
  where rawHtml.parts !~ '^{html selector}'
;
          
  • Use of regexp_split_to_table
  • Use of split_parts
  • Use of with list element
                split_part
-----------------------------------------------------
 {url}/213251/liege-belgian-waffles-with-pearl-sugar
 {url}/17681/belgian-waffles
 {url}/88309/whole-grain-waffles
 {url}/20513/classic-waffles
 {url}/22180/waffles-i
 {url}/246198/easy-french-toast-waffles
 {url}/220890/potato-waffles
 {url}/268309/cream-puff-crack-buns-choux-au-craquelin
 {url}/246917/sams-sourdough-waffles
 {url}/240386/cinnamon-roll-waffles
 {url}/233920/tender-and-easy-buttermilk-waffles
 {url}/171978/great-easy-waffles
 {url}/261650/over-the-top-smores-waffles
 {url}/237196/whole-wheat-coconut-oil-waffles

Let's look at the first link's html source code!

/((((((\\\\
=======((((((((((\\\\\
     ((           \\\\\\\
     ( (*    _/      \\\\\\\
       \    /  \      \\\\\\________________
        |  |   |       </                  ((\\\\
        o_|   /        /                      \ \\\\    \\\\\\\
             |  ._    (                        \ \\\\\\\\\\\\\\\\
             | /                       /       /    \\\\\\\     \\
     .______/\/     /                 /       /         \\\
    / __.____/    _/         ________(       /\
   / / / ________/`_________'         \     /  \_
  / /  \ \                             \   \ \_  \
 ( <    \ \                             >  /    \ \
  \/     \\_                           / /       > )
          \_|                         / /       / /
                                    _//       _//
                                   /_|       /_| 

Let's look at the first link's html source code!

 

Let's get this json and parse it!

create temporary table processedRecipe(
  data text,
  recipe jsonb,
  title text generated always as
      ((recipe -> 'name')::text) stored,
{...}
);
  • Use of generated columns
  • A generated column can't rely on another generated column 😞

Let's get this json and parse it!

 {...}
  image text generated always as (
      ((recipe -> 'image')::jsonb -> 'url')::text
    ) stored,
  description text generated always as (
      (recipe -> 'description')::text
        ) stored,
  yield integer generated always as (
      regexp_replace((
            regexp_split_to_array(
                    (recipe -> 'recipeYield')::text,' '))[1],
                '"',
                '')::integer
        ) stored,
{...}
  • Use of the json's -> operator

Let's get this json and parse it!

 {...}
  ingredients jsonb generated always as (
      recipe -> 'recipeIngredient') stored,
  steps jsonb generated always as (
      recipe -> 'recipeInstructions') stored,
  keywords text[],
  rating double precision  generated always as (
      ((recipe -> 'aggregateRating')::jsonb
           -> 'ratingValue')::float
    ) stored
);
  • Use the jsonb datatype instead of json for performance reasons

Let's get this json and parse it!

copy processedRecipe (data)
  from program 
    'curl -s "{url}" | 
      hxnormalize -x | 
      hxselect -ci "script[type=application\/ld\+json]" |
      tr -d "\n" | 
      sed "s=\\\\=\\\\\\\\=g"';

We now have a huge table of jsons inside the "data" column!

Sadly, this is not a valid json array for Postgres, so I was not able to use a jsonb[] datatype for that column

Let's get this json and parse it!

          update processedRecipe set recipe = (regexp_match(
    data,
    '{ *"@context": "http://schema.org", *"@type": "Recipe",.*}')
        )[1]::jsonb,
  keywords = array['belgium', 'waffle'];

And we're good!

Let's get this json and parse it!

select title,
   description,
   yield,
   ingredients,
   steps,
   keywords,
   rating
 from processedRecipe;
          --------------------------------------------------------------------------------------------------------------------------------------------------------
title       | "Liege Belgian Waffles with Pearl Sugar"
description | "These are sweeter than traditional Brussels Belgian waffles - a sweeter waffle that can be served for breakfast or dessert. Serve waffles warm or cooled with whipped cream and berries or drizzled with warm, melted chocolate."
yield       | 10
ingredients | ["1 (.25 ounce) package active dry yeast", "1 ½ tablespoons white sugar", "¾ cup lukewarm milk", "3 eggs", "1 cup melted butter", "2 teaspoons vanilla extract", "3 cups flour", "½ teaspoon salt", "1 ½ cups pearl sugar (such as Lars' Own®)"]
steps       | [{"text": "Sprinkle the yeast and white sugar over warm milk in a small bowl. The milk should be no more than 100 degrees F (40 degrees C). Let stand for 15 minutes until the yeast softens and begins to form a creamy foam.\n", "@type": "HowToStep"}, {"text": "Whisk the eggs, melted butter, and vanilla extract into the yeast mixture until evenly blended; set aside. Stir together the flour and salt in a separate large bowl, and make a well in the center. Pour the egg mixture into the well, then stir in the flour mixture until a soft dough forms. Cover with a light cloth and let rise in a warm place (80 to 95 degrees F (27 to 35 degrees C)) until doubled in volume, about 30 minutes. Gently mix in the pearl sugar.\n", "@type": "HowToStep"}, {"text": "Preheat a waffle iron according to manufacturer's instructions.\n", "@type": "HowToStep"}, {"text": "Place a baseball-size ball of dough on the preheated waffle iron. Cook waffles until golden and crisp, about 2 minutes. Repeat with remaining dough. Allow waffles to cool for 2 to 3 minutes before serving.\n", "@type": "HowToStep"}]
keywords    | {belgium,waffle}
rating      | 4.738255033557047

Data model

Data model graphic made with mocodo

Data model

create table ingredient (
  id integer generated always as identity primary key,
  name text unique not null
);
create table recipe (
  id integer generated always as identity primary key,
  title text unique not null,
  picture text,
  description text,
  keywords text[] not null,
  rating float
);

Data model

create table step (
  recipeId integer not null references recipe (id),
  ordinality integer not null,
  description text not null,
  primary key (recipeId, ordinality)
);
create table ingredientInRecipe (
  recipeId integer not null references recipe (id),
  yield integer not null,
  ingredientId integer not null references ingredient (id),
  quantity decimal(5,3) not null,
  unit text not null
);

<0,N>-<0,N> relations lead to a new table

Inserting into my tables

  • One single query to insert into the 4 tables
  • Use of returning clause
  • Heavy use of with list elements
  • 110 lines of SQL code
 with insertIntoRecipe(id) as (
  insert into pgwaffles.recipe(title,
  picture,
  description,
  keywords,
  rating)
  (select title,
    image,
      description,
      keywords,
      rating
  from processedRecipe)
  on conflict (title) do update set title = excluded.title
  returning id
), 
 ingredients (data) as (
  select
    /* we'll remove parentethis and what's inside them as it can cause
     * problems and those details shouldn't be necessary */
     regexp_replace(jsonb_array_elements_text(ingredients),
   ' *\([^\)]+\) *',
   ' '
   )
  from processedRecipe
), 
 processedIngredients(data) as (
  select
    regexp_split_to_array (
      /* we'll change those horrible fractions into real numbers */
      case
        when data ~ ' ½' then regexp_replace(data, ' ½','.5')
        when data ~ '½' then regexp_replace(data, '½','0.5')
        when data ~ ' ⅓' then regexp_replace(data, ' ⅓','.3')
        when data ~ '⅓' then regexp_replace(data, '⅓','0.3')
        when data ~ ' ⅕' then regexp_replace(data, ' ⅕','.2')
        when data ~ '⅕' then regexp_replace(data, '⅕','0.2')
        when data ~ ' ⅙' then regexp_replace(data, ' ⅙','.17')
        when data ~ '⅙' then regexp_replace(data, '⅙','0.17')
        when data ~ ' ⅛' then regexp_replace(data, ' ⅛','.125')
        when data ~ '⅛' then regexp_replace(data, '⅛','0.125')
        when data ~ ' ⅔' then regexp_replace(data, ' ⅔','.7')
        when data ~ '⅔' then regexp_replace(data, '⅔','0.7')
        when data ~ ' ⅖' then regexp_replace(data, ' ⅖','.4')
        when data ~ '⅖' then regexp_replace(data, '⅖','0.4')
        when data ~ ' ⅚' then regexp_replace(data, ' ⅚','.8')
        when data ~ '⅚' then regexp_replace(data, '⅚','0.8')
        when data ~ ' ⅜' then regexp_replace(data, ' ⅜','.375')
        when data ~ '⅜' then regexp_replace(data, '⅜','0.375')
        when data ~ ' ¾' then regexp_replace(data, ' ¾','.75')
        when data ~ '¾' then regexp_replace(data, '¾','0.75')
        when data ~ ' ⅗' then regexp_replace(data, ' ⅗','.6')
        when data ~ '⅗' then regexp_replace(data, '⅗','0.6')
        when data ~ ' ⅝' then regexp_replace(data, ' ⅝','.625')
        when data ~ '⅝' then regexp_replace(data, '⅝','0.625')
        when data ~ ' ⅞' then regexp_replace(data, ' ⅞','.875')
        when data ~ ' ⅞' then regexp_replace(data, '⅞','.875')
        when data ~ '⅘' then regexp_replace(data, ' ⅘','.8')
        when data ~ ' ⅘' then regexp_replace(data, '⅘','0.8')
        when data ~ '¼' then regexp_replace(data, ' ¼','.25')
        when data ~ ' ¼' then regexp_replace(data, '¼','0.25')
        when data ~ '⅐' then regexp_replace(data, ' ⅐','.14')
        when data ~ ' ⅐' then regexp_replace(data, '⅐','0.14')
        when data ~ '⅑' then regexp_replace(data, ' ⅑','.11')
        when data ~ ' ⅑' then regexp_replace(data, '⅑','0.11')
        when data ~ '⅒' then regexp_replace(data, ' ⅒','.1')
        when data ~ ' ⅒' then regexp_replace(data, '⅒','0.1')
        else data
      end,
      ' ')
  from ingredients
), 
 insertIntoIngredient(id) as (
  insert into pgwaffles.ingredient (name)
  (select lower(array_to_string(array_remove(array_remove(data,data[1]),data[2]),' '))
  from processedIngredients
  )
  on conflict (name) do update set name = excluded.name
  returning id, name
), 
 insertIntoIngredientInRecipe(id) as (
  insert into pgwaffles.ingredientInRecipe(
    ingredientId,
    recipeId,
    quantity,
    yield,
    unit
  )
  (select insertIntoIngredient.id,
    insertIntoRecipe.id,
    processedIngredients.data[1]::decimal(5,3) as quantity,
    processedRecipe.yield,
    processedIngredients.data[2] as unit
    from insertIntoIngredient
      inner join processedIngredients
        on insertIntoIngredient.name = lower(array_to_string(array_remove(array_remove(processedIngredients.data,processedIngredients.data[1]),data[2]),' ')),
      insertIntoRecipe,
      processedRecipe
  )
  returning recipeId
) 
 insert into pgwaffles.step(recipeId,
  ordinality,
  description)
 (select distinct insertIntoIngredientInRecipe.id,
   steps_with_ordinality.ordinality,
   steps_with_ordinality."text"
 from processedRecipe,
  insertIntoIngredientInRecipe,
  rows from (jsonb_to_recordset(steps) as ("text" text,"@type" text))
    with ordinality steps_with_ordinality)
; 

Display the recipe

with bestRecipe(recipeId) as (
  select id
  from recipe
  order by rating
  limit 1
)
select
  ingredientInRecipe.yield,
  ingredientInRecipe.quantity,
  ingredientInRecipe.unit,
  ingredient.name
from ingredientInRecipe
  inner join bestRecipe
    on ingredientInRecipe.recipeId = bestRecipe.recipeId
  inner join ingredient
    on ingredientInRecipe.ingredientId = ingredient.id

Display the recipe

           yield | quantity |    unit     |       name       
-------+----------+-------------+------------------
    10 |    3.000 | eggs        | 
    10 |    1.000 | package     | active dry yeast
    10 |    3.000 | cups        | flour
    10 |    0.750 | cup         | lukewarm milk
    10 |    1.000 | cup         | melted butter
    10 |    1.500 | cups        | pearl sugar 
    10 |    0.500 | teaspoon    | salt
    10 |    2.000 | teaspoons   | vanilla extract
    10 |    1.500 | tablespoons | white sugar
(9 rows) 

Display the recipe

with bestRecipe(recipeId) as (
  select id
  from recipe
  order by rating
  limit 1
)
select description
from step
  inner join bestRecipe
    on step.recipeId = bestRecipe.recipeId
order by ordinality

Display the recipe

          -[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
description | Sprinkle the yeast and white sugar over warm milk in a small bowl. The milk should be no more than 100 degrees F (40 degrees C). Let stand for 15 minutes until the yeast softens and begins to form a creamy foam.                                                                                                                                                                                                                                                        +
            | 
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
description | Whisk the eggs, melted butter, and vanilla extract into the yeast mixture until evenly blended; set aside. Stir together the flour and salt in a separate large bowl, and make a well in the center. Pour the egg mixture into the well, then stir in the flour mixture until a soft dough forms. Cover with a light cloth and let rise in a warm place (80 to 95 degrees F (27 to 35 degrees C)) until doubled in volume, about 30 minutes. Gently mix in the pearl sugar.+
            | 
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
description | Preheat a waffle iron according to manufacturer's instructions.                                                                                                                                                                                                                                                                                                                                                                                                            +
            | 
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
description | Place a baseball-size ball of dough on the preheated waffle iron. Cook waffles until golden and crisp, about 2 minutes. Repeat with remaining dough. Allow waffles to cool for 2 to 3 minutes before serving.                                                                                                                                                                                                                                                              +
            | 

Put it all together

Put all your sql code inside a file named after your extension with versioning

pgwaffles--1.sql

--complain if script is sourced in psql, 
--rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION pgwaffles" to load this file.
\quit

3. Makefile

EXTENSION = pgwaffles
DATA = pgwaffles--1.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
Image by Free-Photos from Pixabay

The makefile will need a control file

# postgreswaffles extension
# comment = 'Displays the best waffles recipe regarding allrecipes.com'
default_version = '1'
module_pathname = '$libdir/pgwaffles'
relocatable = false
Image by Free-Photos from Pixabay

Installing the extension

sudo make install
/bin/sh /usr/local/pgsql/lib/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
/bin/sh /usr/local/pgsql/lib/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
/usr/bin/install -c -m 644 .//pgwaffles.control '/usr/local/pgsql/share/extension/'
/usr/bin/install -c -m 644 .//pgwaffles--1.sql  '/usr/local/pgsql/share/extension/'

5. Testing!

laetitia=# create extension pgwaffles;
CREATE EXTENSION
laetitia=# \dn
   List of schemas
   Name    |  Owner   
-----------+----------
 pgwaffles | laetitia
 public    | postgres
(2 rows)

5. Testing!

laetitia=# \dt pgwaffles.*
                 List of relations
  Schema   |        Name        | Type  |  Owner   
-----------+--------------------+-------+----------
 pgwaffles | ingredient         | table | laetitia
 pgwaffles | ingredientinrecipe | table | laetitia
 pgwaffles | recipe             | table | laetitia
 pgwaffles | step               | table | laetitia
(4 rows)

laetitia=# \df pgwaffles.*
                                                        List of functions
  Schema   |        Name        | Result data type |                          Argument data types                          | Type 
-----------+--------------------+------------------+-----------------------------------------------------------------------+------
 pgwaffles | displayingredients | SETOF record     | OUT yield integer, OUT quantity numeric, OUT unit text, OUT name text | func
 pgwaffles | displayrecipe      | SETOF text       |                                                                       | func
(2 rows)

5. Testing!

          laetitia=# select * from pgwaffles.displayingredients();
 yield | quantity |    unit     |       name       
-------+----------+-------------+------------------
    10 |    3.000 | eggs        | 
    10 |    1.000 | package     | active dry yeast
    10 |    3.000 | cups        | flour
    10 |    0.750 | cup         | lukewarm milk
    10 |    1.000 | cup         | melted butter
    10 |    1.500 | cups        | pearl sugar 
    10 |    0.500 | teaspoon    | salt
    10 |    2.000 | teaspoons   | vanilla extract
    10 |    1.500 | tablespoons | white sugar
(9 rows) 

5. Testing!

          -[ RECORD 1 ]-+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
displayrecipe | Sprinkle the yeast and white sugar over warm milk in a small bowl. The milk should be no more than 100 degrees F (40 degrees C). Let stand for 15 minutes until the yeast softens and begins to form a creamy foam.                                                                                                                                                                                                                                                        +
              | 
-[ RECORD 2 ]-+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
displayrecipe | Whisk the eggs, melted butter, and vanilla extract into the yeast mixture until evenly blended; set aside. Stir together the flour and salt in a separate large bowl, and make a well in the center. Pour the egg mixture into the well, then stir in the flour mixture until a soft dough forms. Cover with a light cloth and let rise in a warm place (80 to 95 degrees F (27 to 35 degrees C)) until doubled in volume, about 30 minutes. Gently mix in the pearl sugar.+
              | 
-[ RECORD 3 ]-+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
displayrecipe | Preheat a waffle iron according to manufacturer's instructions.                                                                                                                                                                                                                                                                                                                                                                                                            +
              | 
-[ RECORD 4 ]-+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
displayrecipe | Place a baseball-size ball of dough on the preheated waffle iron. Cook waffles until golden and crisp, about 2 minutes. Repeat with remaining dough. Allow waffles to cool for 2 to 3 minutes before serving.                                                                                                                                                                                                                                                              +
              | 

6. Register on pgxn!

          git archive --format zip --output pgwaffles-1.0.0.zip master
Image by Ro Ma from Pixabay

6. Register on pgxn!

https://pgxn.org/dist/pgwaffles/1.0.0/
Image by Ro Ma from Pixabay

Time to taste!

Sources available there : https://gitlab.com/l_avrot/pgwaffles/-/tree/master

    Special thanks to
  • Rick Moris for the idea
  • David Avrot for the video
  • My Postgres Friends for their help
Image by Free-Photos from Pixabay